Fault Tolerant Scheduling for Parallel Loops on Shared Memory Systems

نویسندگان

  • Yizhuo Wang
  • Rosario Cammarota
  • Alexandru Nicolau
چکیده

While multicore/multiprocessor systems achieve significant speedup for many applications by exploiting loop level parallelism, they also suffer from increased reliability problems as a result of ever scaling device size. This paper addresses the reliability of loop dominated applications, aiming to execute parallel loops efficiently in the presence of various types of hardware faults. In this paper, we present a fault tolerant work-stealing scheme which makes parallel loop execution resilient to hardware faults. A lightweight buffer-commit mechanism is applied in the proposed scheme to ensure the correctness of the re-execution of loop iterations. In addition, we split large failing chunks of loop iterations at runtime to improve load balancing, and a worker thread is discarded when faults occur frequently on it. We evaluated our techniques on a multi-socket multicore system, using a set of loop dominated benchmarks. The proposed scheme achieves the minimum overhead of supporting fault tolerance and optimal load balancing.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach

One of the most sought after software innovation of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Many researchers are using conventional techniques such as RPC, DSM, replication, causal communications and other techniques to provide parallel computing facilities on workstation ne...

متن کامل

Adaptively Scheduling Parallel Loops in Distributed Shared-Memory Systems

Using runtime information of load distributions and processor affinity, we propose an adaptive scheduling algorithm and its variations from different control mechanisms. The proposed algorithm applies different degrees of aggressiveness to adjust loop scheduling granularities, aiming at improving the execution performance of parallel loops by making scheduling decisions that match the real work...

متن کامل

Executing Nested Parallel Loops on Shared-Memory Multiprocessors

Cache-coherent, bus-based shared-memory multiprocessors are a cost-e ective platform for parallel processing. In scienti c parallel applications, most of the computation involves processing of large multidimensional data structures which results in a high degree of data parallelism. This parallelism can be exploited in the form of nested parallel loops. Most existing shared memory multiprocesso...

متن کامل

TS-PVM: a fault tolerant PVM extension for real time applications

In this research work, a fault tolerant extension of the de facto message passing system parallel virtual machine, TSparallel virtual machine, is introduced. This extension enables real time applications over parallel virtual machine. In PVM and similar message passing systems, if the message receiver is not available, due to network failure or machine crash, a failure is reported and data must...

متن کامل

EXECUTING NESTED PARALLEL LOOPS ON SHARED - MEMORYMULTIPROCESSORSSadun

Cache-coherent, bus-based shared-memory multiprocessors are a cost-eeective platform for parallel processing. In scientiic parallel applications, most of the computation involves processing of large multidimensional data structures which results in a high degree of data parallelism. This parallelism can be exploited in the form of nested parallel loops. Most existing shared memory multiprocesso...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • J. Inf. Sci. Eng.

دوره 31  شماره 

صفحات  -

تاریخ انتشار 2015